Exploiting Syntactic Structure for Natural Language Modeling

Author

  • Ciprian Chelba
Abstract

The thesis presents an attempt at using the syntactic structure in natural language for improved language models for speech recognition. The structured language model merges techniques in automatic parsing and language modeling using an original probabilistic parameterization of a shift-reduce parser. A maximum likelihood reestimation procedure belonging to the class of expectation-maximization algorithms is employed for training the model. Experiments on the Wall Street Journal, Switchboard and Broadcast News corpora show improvement in both perplexity and word error rate (word lattice rescoring) over the standard 3-gram language model. The significance of the thesis lies in presenting an original approach to language modeling that uses the hierarchical (syntactic) structure in natural language to improve on current 3-gram modeling techniques for large vocabulary speech recognition.

Advisor: Prof. Frederick Jelinek
Readers: Prof. Frederick Jelinek and Prof. Michael Miller

Similar resources

Exploiting Syntactic Structure for Language Modeling

The paper presents a language model that develops syntactic structure and uses it to extract meaningful information from the word history, thus enabling the use of long distance dependencies. The model assigns probability to every joint sequence of words–binary-parse-structure with headword annotation and operates in a left-to-right manner — therefore usable for automatic speech recognition. Th...

A Topic-Oriented Syntactic Component Extraction Model in Social Media

Topic-oriented understanding is to extract information from various language instances, which reflects the characteristics or trends of semantic information related to the topic via statistical analysis. The syntax analysis and modeling is the basis of such work. Traditional syntactic formalization approaches widely used in natural language understanding could not be simply applied to the text ...

Bayesian Modeling of Dependency Trees Using Hierarchical Pitman-Yor Priors

Recent work in hierarchical priors for language modeling [MacKay and Peto, 1994, Teh, 2006, Goldwater et al., 2006] has shown significant advantages to Bayesian methods in NLP. But the issue of sparse conditioning contexts is ubiquitous in NLP, and these smoothing ideas can be applied more broadly to extend the reach of Bayesian modeling in natural language. For example, a useful representation...

Chinese Textual Entailment Recognition Based on Syntactic Tree Clipping

Textual entailment has been proposed as a unifying generic framework for modeling language variability and semantic inference in different Natural Language Processing (NLP) tasks. This paper presents a novel statistical method for recognizing Chinese textual entailment in which lexical, syntactic with semantic matching features are combined together. In order to solve the problems of syntactic ...

Decision Tree-Based Syntactic Language Modeling

Title of dissertation: DECISION TREE-BASED SYNTACTIC LANGUAGE MODELING. Denis Filimonov, Doctor of Philosophy, 2011. Dissertation directed by: Dr. Mary Harper, Department of Computer Science, and Dr. Philip Resnik, Department of Linguistics. Statistical Language Modeling is an integral part of many natural language processing applications, such as Automatic Speech Recognition (ASR) and Machine Translatio...

Journal:
  • CoRR

Volume: cs.CL/0001020

Publication date: 1998